Water Observations from Space (WOFS)

This notebook demonstrates the Australian Water Observations from Space (WOFS) water detection algorithm, which improves on both the Landsat QA water flag and the NDWI index for identifying surface water. For more information, see:

http://www.ga.gov.au/scientific-topics/hazards/flood/wofs
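For comparison, the NDWI mentioned above is a simple band ratio, NDWI = (green − NIR) / (green + NIR), with positive values generally indicating open water. A minimal sketch with hypothetical reflectance values (not taken from this notebook's data):

```python
import numpy as np

def ndwi(green, nir):
    """McFeeters NDWI: positive values generally indicate open water."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (green - nir) / (green + nir)

# Hypothetical surface reflectance values (0-1 scale):
# water reflects more green than NIR; vegetation the opposite.
water_pixel = ndwi(0.10, 0.02)   # positive -> water-like
veg_pixel = ndwi(0.08, 0.40)     # negative -> vegetation-like
print(water_pixel > 0, veg_pixel < 0)  # True True
```

WOFS instead uses a regression-tree classifier over several band ratios, which is why it is more robust than a single-index threshold.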

Import Dependencies and Connect to the Data Cube

In [1]:
import sys, os
os.environ['USE_PYGEOS'] = '0'  # must be set before geopandas/shapely are imported

import datetime
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
import pandas as pd

import datacube
from datacube.utils import masking

from dea_tools.plotting import rgb, display_map
from dea_tools.bandindices import calculate_indices

### EASI tools
sys.path.append(os.path.expanduser('../scripts'))
from ceos_utils.data_cube_utilities.clean_mask import landsat_clean_mask_invalid, landsat_qa_clean_mask
from easi_tools import EasiDefaults
from easi_tools import notebook_utils

easi = EasiDefaults()  # Get the default parameters for this system
dc = datacube.Datacube(app='Water_Observations_from_Space')
Successfully found configuration for deployment "eail"
In [2]:
cluster, client = notebook_utils.initialize_dask(use_gateway=False)
display(cluster if cluster else client)
print(notebook_utils.localcluster_dashboard(client, server=easi.hub))

LocalCluster: 4 workers, 8 threads total, 29.00 GiB total memory
Dashboard: http://127.0.0.1:8787/status
https://hub.eail.easi-eo.solutions/user/jhodge/proxy/8787/status
In [3]:
from datacube.utils.aws import configure_s3_access
configure_s3_access(aws_unsigned=False, requester_pays=True, client=client)
Out[3]:
<botocore.credentials.Credentials at 0x7f6aa43ee6b0>

Choose Platforms and Products

In [4]:
# Define the Product
product = "landsat8_c2l2_sr"

Define the Extents of the Analysis

In [5]:
# Select an analysis region (Latitude-Longitude)
# Select a time period within the extents of the dataset (Year-Month-Day)

# Example region: Mombasa, Kenya
# latitude = (-4.05, -3.95)
# longitude = (39.60, 39.68)

# System defaults:
# latitude = easi.latitude
# longitude = easi.longitude

# Lake Mead, Nevada, USA
latitude = (36.28, 36.48)
longitude = (-114.43, -114.325)

# Define Time Range
# Landsat-8 time range: 07-Apr-2013 to current
time_extents = ('2021-01-01', '2021-12-31')
In [6]:
# The code below renders an interactive map of the analysis region.
display_map(longitude, latitude)
/env/lib/python3.10/site-packages/dea_tools/plotting.py:313: FutureWarning: This function is deprecated. See: https://pyproj4.github.io/pyproj/stable/gotchas.html#upgrading-to-pyproj-2-from-pyproj-1
  all_longitude, all_latitude = transform(Proj(crs), Proj('EPSG:4326'), all_x,
Out[6]:
[Interactive map of the analysis region]

Load and Clean Data from the Data Cube

After loading, you can inspect the Xarray dataset. Notice that the dimensions give the number of pixels in the latitude and longitude directions as well as the number of time slices (time) in the time series.

In [7]:
measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa']
data_names = measurements.copy()
data_names.remove('pixel_qa')
In [8]:
landsat_dataset = dc.load(latitude = latitude,
                          longitude = longitude,
                          time = time_extents,
                          product = product,
                          output_crs = 'epsg:6933',
                          resolution = (-30,30),
                          measurements = measurements,
                          group_by = 'solar_day',
                          dask_chunks = {'time':1}) 
In [9]:
landsat_dataset
Out[9]:
<xarray.Dataset>
Dimensions:      (time: 23, y: 689, x: 338)
Coordinates:
  * time         (time) datetime64[ns] 2021-01-13T18:15:45.783864 ... 2021-12...
  * y            (y) float64 4.353e+06 4.353e+06 ... 4.332e+06 4.332e+06
  * x            (x) float64 -1.104e+07 -1.104e+07 ... -1.103e+07 -1.103e+07
    spatial_ref  int32 6933
Data variables:
    red          (time, y, x) uint16 dask.array<chunksize=(1, 689, 338), meta=np.ndarray>
    green        (time, y, x) uint16 dask.array<chunksize=(1, 689, 338), meta=np.ndarray>
    blue         (time, y, x) uint16 dask.array<chunksize=(1, 689, 338), meta=np.ndarray>
    nir          (time, y, x) uint16 dask.array<chunksize=(1, 689, 338), meta=np.ndarray>
    swir1        (time, y, x) uint16 dask.array<chunksize=(1, 689, 338), meta=np.ndarray>
    swir2        (time, y, x) uint16 dask.array<chunksize=(1, 689, 338), meta=np.ndarray>
    pixel_qa     (time, y, x) uint16 dask.array<chunksize=(1, 689, 338), meta=np.ndarray>
Attributes:
    crs:           epsg:6933
    grid_mapping:  spatial_ref
In [10]:
# A pixel is usable if it is flagged clear or water, and is not
# high-confidence cloud or cloud shadow.
clear_mask = masking.make_mask(landsat_dataset['pixel_qa'], clear='clear')
water_mask = masking.make_mask(landsat_dataset['pixel_qa'], water='water')
cloud_mask = masking.make_mask(landsat_dataset['pixel_qa'],
                               cloud='not_high_confidence',
                               cloud_shadow='not_high_confidence')
clean_mask = (clear_mask | water_mask) & cloud_mask
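Under the hood, `make_mask` expands the bit-packed `pixel_qa` band using the product's `flags_definition` (for Landsat Collection 2 QA_PIXEL: cloud is bit 3, cloud shadow bit 4, clear bit 6, water bit 7). A rough numpy equivalent of the same mask logic, for illustration only:

```python
import numpy as np

# Bit positions from the Landsat Collection 2 QA_PIXEL flags_definition
CLOUD, CLOUD_SHADOW, CLEAR, WATER = 3, 4, 6, 7

def bit_set(qa, bit):
    """Return True where the given QA bit is set."""
    return (qa >> bit) & 1 == 1

qa = np.array([
    0b0000000001000000,  # clear land pixel (bit 6)
    0b0000000010000000,  # water pixel (bit 7)
    0b0000000000001000,  # high-confidence cloud (bit 3)
], dtype=np.uint16)

clear_mask = bit_set(qa, CLEAR)
water_mask = bit_set(qa, WATER)
cloud_mask = ~bit_set(qa, CLOUD) & ~bit_set(qa, CLOUD_SHADOW)
clean_mask = (clear_mask | water_mask) & cloud_mask
print(clean_mask)  # [ True  True False]
```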

Time Series Water Detection Analysis

This section produces the time-series output of the Australian Water Observations from Space (WOFS) algorithm. The result shows the percentage of time each pixel is classified as water over the entire time series: BLUE = frequent water, RED = infrequent water.

In [13]:
# WOFS was written for Landsat Collection 1, so the Collection 2 data
# must be rescaled to the Collection 1 value range.
# Adapted from https://github.com/GeoscienceAustralia/wofs/blob/master/wofs/virtualproduct.py
def scale_usgs_collection2(data):
    """Scale factors are taken from the Fractional Cover scaling values."""
    attrs = data.attrs
    data = data.map(scale_and_clip_dataarray, keep_attrs=False,
                    scale_factor=0.275, add_offset=-2000,
                    clip_range=None, valid_range=(0, 10000))
    data.attrs = attrs
    return data

def scale_and_clip_dataarray(dataarray: xr.DataArray, *, scale_factor=1, add_offset=0, clip_range=None,
                             valid_range=None, new_nodata=-999, new_dtype='int16'):
    orig_attrs = dataarray.attrs
    nodata = dataarray.attrs['nodata']

    mask = dataarray.data == nodata

    # Values outside valid_range are also set to nodata after scaling (below)
    dataarray = dataarray * scale_factor + add_offset

    if clip_range is not None:
        dataarray = dataarray.clip(*clip_range)

    dataarray = dataarray.astype(new_dtype)

    dataarray.data[mask] = new_nodata
    if valid_range is not None:
        valid_min, valid_max = valid_range
        dataarray = dataarray.where(dataarray >= valid_min, new_nodata)
        dataarray = dataarray.where(dataarray <= valid_max, new_nodata)
    dataarray.attrs = orig_attrs
    dataarray.attrs['nodata'] = new_nodata

    return dataarray

landsat_dataset_scaled = scale_usgs_collection2(landsat_dataset[data_names])  # scale only the measurement bands, not pixel_qa
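The rescaling can be sanity-checked by hand: Collection 2 stores surface reflectance as DN × 0.0000275 − 0.2, while the Collection 1 convention expected by WOFS is that value × 10000, i.e. DN × 0.275 − 2000. A standalone numpy sketch of the same transform (a simplified stand-in for `scale_and_clip_dataarray`, not the exact implementation above):

```python
import numpy as np

def scale_c2_to_c1(dn, nodata=0, new_nodata=-999, valid_range=(0, 10000)):
    """Rescale Collection 2 SR digital numbers to Collection 1 style
    0-10000 reflectance as int16, propagating nodata."""
    dn = np.asarray(dn)
    mask = dn == nodata
    scaled = (dn * 0.275 - 2000).astype('int16')
    scaled[mask] = new_nodata
    lo, hi = valid_range
    return np.where((scaled >= lo) & (scaled <= hi), scaled, new_nodata)

# DN 18182 corresponds to ~0.30 reflectance, i.e. 3000 in C1 units;
# DN 0 is nodata; DN 65535 scales far above the valid range.
dn = np.array([0, 18182, 65535], dtype=np.uint16)
print(scale_c2_to_c1(dn))  # [-999 3000 -999]
```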
In [14]:
from ceos_utils.data_cube_utilities.dc_water_classifier import wofs_classify
ts_water_classification = wofs_classify(landsat_dataset_scaled, clean_mask=clean_mask.values, no_data=-9999, x_coord='x', y_coord='y')
/env/lib/python3.10/site-packages/rasterio/warp.py:344: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
  _reproject(
In [15]:
# Set no_data values to NaN so they are ignored in the aggregation
ts_water_classification = ts_water_classification.where(ts_water_classification != -9999).astype(np.float16)

# Time series aggregation that ignores NaN values
water_classification_percentages = (ts_water_classification.mean(dim=['time']) * 100).wofs.rename('water_classification_percentages')
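The mean over time of a 0/1 water flag, with NaN treated as missing, is exactly the percentage of clear observations classified as water (xarray's `mean` skips NaN by default). A small numpy illustration with a hypothetical single-pixel time series:

```python
import numpy as np

# Hypothetical time series for one pixel: 1 = water, 0 = not water,
# NaN = no clear observation (cloud, shadow, or nodata).
pixel = np.array([1.0, 0.0, np.nan, 1.0, 1.0])

# nanmean ignores the NaN, so only the 4 valid observations count:
# 3 water observations out of 4 -> 75%.
pct_water = np.nanmean(pixel) * 100
print(pct_water)  # 75.0
```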
In [16]:
# Import the color scheme and set NaN (no data) to black.
# Copy the colormap first: recent versions of matplotlib do not allow
# modifying a registered colormap in place.
jet_r = plt.get_cmap('jet_r').copy()
jet_r.set_bad('black', 1.0)
In [17]:
# Preserve the region's aspect ratio when sizing the figure
img_scale = water_classification_percentages.shape[0] / water_classification_percentages.shape[1]
In [18]:
# This is where the WOFS time series product is generated.
# Areas of RED have experienced little or no water over the time series.
# Areas of BLUE have experienced significant or constant water over the time series.
figsize = 6
water_classification_percentages.plot(cmap=jet_r, figsize=(figsize, figsize*img_scale), vmin=0, vmax=100)
plt.title("Percent of Samples Classified as Water")
plt.axis('off')
plt.show()
/env/lib/python3.10/site-packages/dask/core.py:119: RuntimeWarning: divide by zero encountered in divide
  return func(*(_execute_task(a, cache) for a in args))